Corpus

Introduction

The corpus for this project consists of the Spotify top 50 charts of multiple countries for which such a chart is available. This is an interesting corpus because it can show the differences and similarities between popular music in many different places across the world. The comparison points can be countries, regions, or continents, and this flexibility is another advantage of the corpus.

The corpus also has limitations. First, recent releases or events can influence music worldwide and push tracks to the top of many top 50 lists without saying much about the differences between places (for example, the recent Kanye West release placed multiple songs in multiple different top 50 lists). Second, in some countries Spotify might not be the main source of music, so regional trends in the corpus may not be representative of the population of the country as a whole. Third, the corpus changes daily, so it can be hard to look into individual tracks, since they will probably drop out of the charts after a certain time.

Finally, it is difficult to pick specific interesting tracks because of the sheer number of tracks (3600) in the corpus. Some that may be interesting at first glance are CARNIVAL (Kanye West), Cruel Summer (Taylor Swift), Unwritten (Natasha Bedingfield) and I Wanna Be Yours (Arctic Monkeys). CARNIVAL and many other songs from Kanye West’s (very) recent album are interesting because they can be found in nearly every top 50 list. The other three songs are mostly interesting because they are older songs that still appear in multiple top 50 lists across the corpus.

Outlier analysis

Release date of the songs

Plot A

This plot shows the release date of all songs in the top 50 of all the different regions. I expected to see mostly very new songs with maybe a few outliers. Surprisingly, however, a lot of songs are from before 2022, and some were even released before 2000! The plot also shows some regional differences. The Philippines and the UK seem to enjoy older songs the most, with multiple songs from between 1998 and 2014 making their top 50s, while the USA seems to enjoy songs from between 2014 and 2022. Morocco, the Netherlands, Brazil and Australia seem to like only newer songs, with a handful of exceptions.
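The regional comparison above can also be put into numbers. The following is a minimal sketch, assuming each track is available as a (country, title, release date) row; the rows and the helper name `share_older_than` are made up for illustration and are not taken from the corpus.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (country, track title, release date). Not real corpus data.
tracks = [
    ("Philippines", "track_a", date(2009, 5, 1)),
    ("Philippines", "track_b", date(2023, 11, 3)),
    ("UK", "track_c", date(1998, 7, 20)),
    ("UK", "track_d", date(2024, 1, 12)),
    ("Brazil", "track_e", date(2023, 9, 8)),
    ("Brazil", "track_f", date(2024, 2, 2)),
]


def share_older_than(rows, cutoff_year):
    """Return, per country, the fraction of tracks released before cutoff_year."""
    total = defaultdict(int)
    older = defaultdict(int)
    for country, _title, released in rows:
        total[country] += 1
        if released.year < cutoff_year:
            older[country] += 1
    return {country: older[country] / total[country] for country in total}


shares = share_older_than(tracks, 2022)
for country, share in sorted(shares.items()):
    print(f"{country}: {share:.0%} released before 2022")
```

A share per country makes it easier to compare a chart with a few very old songs against one with many moderately old songs, which the scatter of release dates alone does not distinguish.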

Plot B

This plot shows the relation between energy and valence in music from all the different top 50s. It shows that energy and valence are correlated, but not as strongly as you might expect. It also shows that nearly every popular song has an energy value of 0.4 or higher, and the songs below that threshold always have a valence of about 0.5 or lower. This seems to mean that while every “emotional” value is represented in the different top 50s, a song needs a minimal amount of energy to become popular. A few regional differences can be observed. Brazil and the UK seem to enjoy more energetic and happier music than average, while the Philippines shows the opposite pattern. The other countries all seem to have a similar distribution of energy and valence.
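How strong the relation between energy and valence is can be expressed with a correlation coefficient. Below is a minimal sketch of the Pearson correlation; the `energy` and `valence` lists are invented example values, not measurements from the corpus, and the function name `pearson` is only illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical feature values for six tracks (both features range from 0 to 1).
energy = [0.45, 0.62, 0.71, 0.80, 0.55, 0.90]
valence = [0.30, 0.70, 0.40, 0.85, 0.50, 0.60]

r = pearson(energy, valence)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would mean that energetic songs are almost always happy songs; a moderate positive value matches the weaker relation described above.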

[Figures: Plot A and Plot B]

Regional differences

Regional differences

When looking for regional differences, it is useful to first look at the individual features per country. This shows what the norm is and how each country differs from it.
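One simple way to make "the norm" concrete is to take the mean of a feature over all tracks and then look at how far each country's mean lies from it. The sketch below assumes (country, feature value) pairs; the numbers and the name `deviation_from_norm` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (country, energy) pairs. Not real corpus data.
rows = [
    ("Brazil", 0.80), ("Brazil", 0.70),
    ("Philippines", 0.40), ("Philippines", 0.50),
    ("UK", 0.75), ("UK", 0.65),
]


def deviation_from_norm(rows):
    """Return the overall mean and each country's mean minus that overall mean."""
    by_country = defaultdict(list)
    for country, value in rows:
        by_country[country].append(value)
    norm = mean(value for _country, value in rows)
    deviations = {country: mean(values) - norm
                  for country, values in by_country.items()}
    return norm, deviations


norm, deviations = deviation_from_norm(rows)
print(f"norm: {norm:.2f}")
for country, diff in sorted(deviations.items(), key=lambda item: item[1]):
    print(f"{country}: {diff:+.2f}")
```

Here the norm is the mean over all tracks, so every track counts equally. Because each chart has the same length this is the same as averaging the country means; with charts of unequal size the two would differ.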

Acousticness

Acousticness shows an interesting trend: all the English-speaking countries and the Netherlands seem to prefer less acoustic songs. Morocco likes acoustic music the most by far, and its regional music seems to have more acoustic elements, which reflects that preference.

Danceability

There seem to be no strong regional trends in danceability. Morocco has a slightly above-average preference for danceability and the Philippines a slightly below-average one.

Energy

Brazil, the Netherlands and the UK prefer music with more energy, which might mean that South America and Europe in general like more energetic music. The Philippines is far below average in the energy level of its music, which seems to be a strong regional preference.

Liveness

Live music does not seem to be all that popular in any country. On average, very popular music is nearly always very polished and thus not live. Only Brazil seems to have a significant number of tracks with a higher liveness value, which says something about the type of music that is preferred there. Brazilian listeners seem to care less about the polish of typical pop music and may appreciate the more “real” live sound.

Loudness

This graph is very similar to the energy graph, with the same countries at the top and at the bottom. One major difference is that Brazil’s average is much higher than that of the second-place country, whereas for energy the gap was smaller. This again seems to be a regional preference.

Speechiness

This graph is very similar to the liveness graph. Popular music usually has little to no speech, since spoken content rarely occurs in very popular songs. But again there are exceptions: Morocco and Brazil do seem to have a significant number of tracks with some amount of speechiness, which says something about the regional music they prefer.

Tempo

It’s very interesting to see that there really seems to be an optimal tempo for a song to become popular, regardless of region. Only in Brazil is the average preferred tempo a bit higher, which makes sense considering the country’s preference for louder and more energetic music than average.

Valence

The average valence seems to be at or slightly below the midpoint of the scale (0.5 on Spotify’s 0 to 1 range). This is surprising to me, because you would expect popular music to be happier on average, since people generally like music that makes them happy. However, Brazil and Morocco break this trend and show a clear preference for happier songs.

[Figures: per-country plots of acousticness, danceability, energy, liveness, loudness, speechiness, tempo and valence]

Individual songs

Description

This is a chromagram of an outlier in the dataset: “Cruel Summer” by Taylor Swift, one of the oldest songs that appears in multiple top 50 lists.

Chroma analysis of an outlier

Structure similarity matrix of an outlier

Description

These are two self-similarity matrices of “What” by Taylor Swift, one based on chroma and one based on timbre.
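To illustrate how such a matrix is built, here is a minimal sketch that compares every frame of a feature sequence with every other frame. It assumes cosine distance as the comparison measure, which may differ from the measure used for the plots here, and the three-dimensional `frames` are toy vectors standing in for real chroma or timbre features.

```python
from math import sqrt


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def self_similarity(frames):
    """Square matrix with the distance between every pair of frames."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]


# Toy feature frames (hypothetical): the last frame repeats the first section.
frames = [
    [1.0, 0.0, 0.2],  # section A
    [0.9, 0.1, 0.2],  # section A, slightly varied
    [0.0, 1.0, 0.3],  # section B
    [1.0, 0.0, 0.2],  # section A returns
]

matrix = self_similarity(frames)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

Low values away from the main diagonal mark moments that resemble each other, which is how returning sections such as a chorus become visible as stripes or blocks in the plot.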

Chordogram of a pop song

This chordogram shows the chords in “End of Beginning” by Djo. The song shows six clear sections. This is a typical feature of many popular songs in the corpus: the sections that share the same chords are the chorus, which returns three times. A recurring chorus has historically been a key feature in creating popular and, most importantly, catchy songs.
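A chordogram is typically obtained by comparing each chroma frame with a set of chord templates. The sketch below shows that idea with binary major and minor triad templates and a plain dot product as the match score; both choices are assumptions for illustration and not necessarily what was used for the plot, and the two chroma vectors are made up.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_template(root, third):
    """12-bin template with ones on the root, the third and the perfect fifth."""
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template


# 24 templates: a major (third = 4 semitones) and minor (3 semitones) triad per root.
TEMPLATES = {}
for root, name in enumerate(PITCH_CLASSES):
    TEMPLATES[f"{name}:maj"] = triad_template(root, 4)
    TEMPLATES[f"{name}:min"] = triad_template(root, 3)


def best_chord(chroma):
    """Label of the template with the highest dot product with the chroma frame."""
    def score(label):
        return sum(c * t for c, t in zip(chroma, TEMPLATES[label]))
    return max(TEMPLATES, key=score)


# Hypothetical chroma frames: energy mostly on C, E, G and on A, C, E.
frame_1 = [1.0, 0, 0.1, 0, 0.9, 0, 0, 0.8, 0, 0.1, 0, 0]
frame_2 = [0.9, 0, 0, 0, 0.8, 0, 0, 0.1, 0, 1.0, 0, 0]

print(best_chord(frame_1), best_chord(frame_2))
```

Applying this to every frame of a song gives a sequence of chord labels, and sections with the same chord sequence then show up as repeated patterns.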

Chordogram

Tempogram

Not all pop is rhythmically simple

This tempogram shows an outlier from the corpus: “Pink + White” by Frank Ocean. It is not only one of the older songs in the corpus but also one of the very few songs that do not have a clear, unchanging tempo all the way through. Spotify estimates the overall tempo of this track at 160 BPM, and most online sources agree with this assessment. However, the graph also shows activity at multiple other BPM values. This is really interesting because it suggests that the tempo of the song might not be clear to the listener, which, judging by the other songs in the corpus, is not a recipe for a successful song. Nevertheless, “Pink + White” has clearly defied the odds and has been massively successful, even charting again years after its initial release.
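When reading a tempogram it helps to know that even a perfectly steady pulse can show strength at more than one BPM value. The sketch below illustrates this with a simple autocorrelation on a synthetic onset envelope; the frame rate, the function names and the pulse train are all assumptions for illustration, and the tempogram shown here may have been computed with a different method. It does not show that this effect explains the activity seen for “Pink + White”.

```python
FRAME_RATE = 80  # onset-envelope frames per second (assumed for this sketch)


def pulse_train(bpm, seconds):
    """Synthetic onset envelope: a 1.0 on every beat of a steady tempo, else 0.0."""
    period = round(60 * FRAME_RATE / bpm)  # frames between two beats
    n_frames = FRAME_RATE * seconds
    return [1.0 if i % period == 0 else 0.0 for i in range(n_frames)]


def tempo_strength(envelope, bpm):
    """Mean product of the envelope with itself shifted by one beat at `bpm`."""
    lag = round(60 * FRAME_RATE / bpm)
    pairs = len(envelope) - lag
    return sum(envelope[i] * envelope[i + lag] for i in range(pairs)) / pairs


envelope = pulse_train(160, seconds=10)
for bpm in (80, 120, 160):
    print(f"{bpm} BPM: {tempo_strength(envelope, bpm):.4f}")
```

In this toy case the 160 BPM pulse also scores at 80 BPM, because every second beat lines up with the slower pulse, while 120 BPM gets no support at all. Activity at unrelated tempi, or activity that shifts over time, is therefore more informative about a genuinely unclear tempo than activity at half the estimated tempo.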